Machine learning for query formulation in question answering

نویسنده

  • Christof Monz
چکیده

Research on question answering dates back to the 1960s but has more recently been revisited as part of TREC's evaluation campaigns, where question answering is addressed as a subarea of information retrieval that focuses on specific answers to a user's information need. Whereas document retrieval systems aim to return the documents that are most relevant to a user's query, question answering systems aim to return actual answers to a users question. Despite this difference, question answering systems rely on information retrieval components to identify documents that contain an answer to a user's question. The computationally more expensive answer extraction methods are then applied only to this subset of documents that are likely to contain an answer. As information retrieval methods are used to filter the documents in the collection, the performance of this component is critical as documents that are not retrieved are not analyzed by the answer extraction component. The formulation of queries that are used for retrieving those documents has a strong impact on the effectiveness of the retrieval component. In this paper, we focus on predicting the importance of terms from the original question. We use model tree machine learning techniques in order to assign weights to query terms according to their usefulness for identifying documents that contain an answer. Term weights are learned by inspecting a large number of query formulation variations and their respective accuracy in identifying documents containing an answer. Several linguistic features are used for building the models, including part-of-speech tags, degree of connectivity in the dependency parse tree of the question, and ontological information. All of these features are extracted automatically by using several natural language processing tools. Incorporating the learned weights into a state-of-the-art retrieval system results in statistically significant improvements in identifying answer-bearing documents.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Model Tree Learning for Query Term Weighting in Question Answering

Question answering systems rely on retrieval components to identify documents that contain an answer to a user’s question. The formulation of queries that are used for retrieving those documents has a strong impact on the effectiveness of the retrieval component. Here, we focus on predicting the importance of terms from the original question. We use model tree machine learning techniques in ord...

متن کامل

دسته‌بندی پرسش‌ها با استفاده از ترکیب دسته‌بندها

Question answering systems are produced and developed to provide exact answers to the question posted in natural language. One of the most important parts of question answering systems is question classification. The purpose of question classification is predicting the kind of answer needed for the question in natural language. The  literature works can be categorized as rule-based and learning...

متن کامل

Distributed NLP and Machine Learning for Question Answering Grid

We regard question answering as Semantic Grid application in which the answering process is best expressed as a distributed computing task and show, how the workflow control of this distributed QA task can be learned automatically. Since the control protocol contains information about each resource and service, this information can be mined to reveal semantics about the components. We address t...

متن کامل

Investigating Embedded Question Reuse in Question Answering

The investigation presented in this paper is a novel method in question answering (QA) that enables a QA system to gain performance through reuse of information in the answer to one question to answer another related question. Our analysis shows that a pair of question in a general open domain QA can have embedding relation through their mentions of noun phrase expressions. We present methods f...

متن کامل

Chunking-based Question Type Identification for Multi-Sentence Queries

This paper describes a technique of question type identification for multi-sentence queries in open domain question-answering. Based on observations of queries in real question-answering services on the Web, we propose a method to decompose a multi-sentence query into question items and to identify their question types. The proposed method is an efficient sentence-chunking based technique by us...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Natural Language Engineering

دوره 17  شماره 

صفحات  -

تاریخ انتشار 2011